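Recently, Truncated Quantile Critics (TQC), which uses a distributional representation of critics, was shown to provide state-of-the-art asymptotic training performance on all environments in the MuJoCo continuous control benchmark suite. In addition, Randomized Ensemble Double Q-Learning (REDQ), which uses a high update-to-data ratio and target randomization, was shown to achieve high sample efficiency that is competitive with state-of-the-art model-based methods. In this paper, we propose a novel model-free algorithm, Aggressive Q-Learning with Ensembles (AQE), which improves on the sample-efficiency performance of REDQ and the asymptotic performance of TQC, thereby providing overall state-of-the-art performance during all stages of training. Moreover, AQE is very simple, requiring neither a distributional representation of critics nor target randomization.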
translated by 谷歌翻译
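In reinforcement learning, Monte Carlo algorithms update the Q-function by averaging episodic returns. In the Monte Carlo UCB (MC-UCB) algorithm, the action taken in each state is the one that maximizes the Q-function plus a UCB exploration term, which biases the choice toward actions that have been chosen less frequently. Although there has been significant work on establishing regret bounds for MC-UCB, most of that work has focused on finite-horizon versions of the problem, in which each episode terminates after a constant number of steps. For such finite-horizon problems, the optimal policy depends on both the current state and the time within the episode. However, for many natural episodic problems, such as games like Go and chess and robotic tasks, the episode length is random and the optimal policy is stationary. For such environments, it is an open question whether the Q-function in MC-UCB converges to the optimal Q-function. We conjecture that, unlike Q-learning, it does not converge for all MDPs. Nevertheless, we show that for a large class of MDPs, which includes stochastic MDPs such as blackjack and deterministic MDPs such as Go, the Q-function in MC-UCB converges almost surely to the optimal Q-function. An immediate corollary of this result is that it also converges almost surely for all finite-horizon MDPs. We also provide numerical experiments that give further insights into MC-UCB.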
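The action rule described above can be sketched for a single state as follows. This is a minimal illustration, not the paper's exact algorithm: the bonus form `c * sqrt(ln(N) / n_a)`, the constant `c`, and the function names `ucb_action` and `update_q` are assumptions, since the abstract only says that a UCB exploration term is added to the Q-function.

```python
import math


def ucb_action(q_values, counts, c=1.0):
    """Return the index of the action maximizing Q + exploration bonus.

    q_values: Q estimates for the actions available in one state.
    counts:   how often each action has been chosen in that state.
    c:        exploration constant (hypothetical; not given in the abstract).
    """
    # An action that was never tried is selected first, which avoids a
    # division by zero in the bonus term below.
    for action, n in enumerate(counts):
        if n == 0:
            return action
    total = sum(counts)
    scores = [
        q + c * math.sqrt(math.log(total) / n)  # assumed standard UCB bonus
        for q, n in zip(q_values, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def update_q(q_values, counts, action, episode_return):
    """Monte Carlo update: keep Q as the running average of observed returns."""
    counts[action] += 1
    q_values[action] += (episode_return - q_values[action]) / counts[action]
```

With `c=0` the rule reduces to greedy action selection; larger values of `c` favor rarely chosen actions more strongly.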
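In this study, radiomics approaches are extended to optical fluorescence molecular imaging data for tissue classification, termed "optomics". Fluorescence molecular imaging is emerging as a tool for precise surgical guidance during head and neck squamous cell carcinoma (HNSCC) resection. However, the tumor-to-normal tissue contrast is confounded by intrinsic physiological limitations arising from heterogeneous expression of the target molecule, epidermal growth factor receptor (EGFR). Optomics seeks to improve tumor identification by probing textural pattern differences in EGFR expression conveyed by fluorescence. A total of 1,472 standardized optomic features were extracted from fluorescence image samples. A supervised machine learning pipeline involving a support vector machine classifier was trained on the 25 top-ranked features selected by the minimum redundancy maximum relevance criterion. Model predictive performance was compared with a fluorescence intensity thresholding method by classifying image patches of resected tissue with histologically confirmed malignancy status. Compared with fluorescence intensity thresholding, the optomics approach provided a consistent improvement in prediction accuracy on all test set samples, irrespective of dose (mean accuracy of 89% vs. 81%; P = 0.0072). The improved performance demonstrates that extending the radiomics approach to fluorescence molecular imaging data offers a promising image analysis technique for cancer detection in fluorescence-guided surgery.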
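Multi-instance learning (MIL) is widely used in the computer-aided interpretation of pathological whole slide images (WSIs) to address the lack of pixel-wise or patch-wise annotations. Often, this approach directly applies "natural image driven" MIL algorithms, which overlook the multi-scale (i.e., pyramidal) nature of WSIs. Off-the-shelf MIL algorithms are typically deployed on a single scale of WSIs (e.g., 20x magnification), whereas human pathologists usually aggregate global and local patterns in a multi-scale manner (e.g., by zooming in and out between different magnifications). In this study, we propose a novel cross-scale attention mechanism to explicitly aggregate inter-scale interactions into a single MIL network for Crohn's disease (CD), which is a form of inflammatory bowel disease. The contribution of this paper is two-fold: (1) a cross-scale attention mechanism is proposed to aggregate features from different resolutions with multi-scale interaction; (2) differential multi-scale attention visualizations are generated to localize explainable lesion patterns. Trained on approximately 250,000 H&E-stained ascending colon (AC) patches at different scales from 20 CD patients and 30 healthy control samples, our approach achieved a superior area under the curve (AUC) score of 0.8924 compared with baseline models. The official implementation is publicly available at https://github.com/hrlblab/cs-mil.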
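We propose VRL3, a powerful data-driven framework with a simple design for solving challenging visual deep reinforcement learning (DRL) tasks. We analyze a number of major obstacles to taking a data-driven approach and present a suite of design principles, novel findings, and critical insights about data-driven visual DRL. Our framework has three stages: in stage 1, we leverage non-RL datasets (e.g., ImageNet) to learn task-agnostic visual representations; in stage 2, we use offline RL data (e.g., a limited number of expert demonstrations) to convert the task-agnostic representations into more powerful task-specific representations; in stage 3, we fine-tune the agent with online RL. On a set of challenging hand manipulation tasks with sparse rewards and realistic visual inputs, VRL3 achieves on average 780% better sample efficiency than the previous SOTA. On the hardest task, VRL3 is 1220% more sample efficient (2440% when using a wider encoder) and solves the task with only 10% of the computation. These significant results clearly demonstrate the great potential of data-driven deep reinforcement learning.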
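A simple and natural algorithm for reinforcement learning (RL) is Monte Carlo Exploring Starts (MCES), in which the Q-function is estimated by averaging Monte Carlo returns, and the policy is improved by choosing actions that maximize the current estimate of the Q-function. Exploration is performed by "exploring starts": each episode begins with a randomly chosen state and action and then follows the current policy to the terminal state. In the classic book on RL by Sutton & Barto (2018), it is stated that establishing convergence of the MCES algorithm is one of the most important remaining open theoretical problems in RL. However, the convergence question for MCES turns out to be quite nuanced. Bertsekas & Tsitsiklis (1996) provide a counterexample showing that the MCES algorithm does not necessarily converge. Tsitsiklis (2002) further shows that if the original MCES algorithm is modified so that the Q-function estimates are updated at the same rate for all state-action pairs, and the discount factor is strictly less than one, then the MCES algorithm converges. In this paper, we make headway with the original and more efficient MCES algorithm given in Sutton & Barto (1998), establishing almost sure convergence for Optimal Policy Feed-Forward MDPs, which are MDPs whose states are not revisited within any episode when using an optimal policy. Such MDPs include a large class of environments, such as all deterministic environments and all episodic environments with a time step or any monotonically changing value as part of the state. Unlike previous proofs, which use stochastic approximation, we introduce a novel inductive approach that is very simple and makes use only of the strong law of large numbers.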
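The MCES loop described above (random starting state and action, greedy rollout, averaging of returns) can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the three-state deterministic chain, its rewards, the undiscounted return, and the names `step`, `greedy`, and `mces` do not come from the paper.

```python
import random

# Hypothetical toy MDP: states 0, 1, 2 plus a terminal state 3.
N_STATES, TERMINAL = 3, 3
JUMP, ADVANCE = 0, 1


def step(state, action):
    """Deterministic dynamics: ADVANCE costs 1 and moves one state forward,
    JUMP costs 5 and goes straight to the terminal state."""
    if action == ADVANCE:
        return state + 1, -1.0
    return TERMINAL, -5.0


def greedy(q, state):
    """Action with the highest current Q estimate (ties go to JUMP)."""
    return max((JUMP, ADVANCE), key=lambda a: q[state][a])


def mces(episodes=2000, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    n = [[0, 0] for _ in range(N_STATES)]
    for _ in range(episodes):
        # Exploring start: random initial state and action.
        state = rng.randrange(N_STATES)
        action = rng.choice((JUMP, ADVANCE))
        trajectory = []
        while state != TERMINAL:
            next_state, reward = step(state, action)
            trajectory.append((state, action, reward))
            state = next_state
            if state != TERMINAL:
                action = greedy(q, state)  # follow the current policy
        # Average the (undiscounted) return into every visited pair.
        g = 0.0
        for s, a, r in reversed(trajectory):
            g += r
            n[s][a] += 1
            q[s][a] += (g - q[s][a]) / n[s][a]
    return q


q = mces()
policy = [greedy(q, s) for s in range(N_STATES)]
```

In this chain no state is revisited within an episode, and only the state-action pairs that an episode actually visits are updated, as in the original algorithm rather than the uniformly updated variant.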
Making histopathology image classifiers robust to a wide range of real-world variability is a challenging task. Here, we describe a candidate deep learning solution for the Mitosis Domain Generalization Challenge 2022 (MIDOG) to address the problem of generalization for mitosis detection in images of hematoxylin-eosin-stained histology slides under high variability (scanner, tissue type and species variability). Our approach consists in training a rotation-invariant deep learning model using aggressive data augmentation with a training set enriched with hard negative examples and automatically selected negative examples from the unlabeled part of the challenge dataset. To optimize the performance of our models, we investigated a hard negative mining regime search procedure that led us to train our best model using a subset of image patches representing 19.6% of our training partition of the challenge dataset. Our candidate model ensemble achieved an F1-score of 0.697 on the final test set after automated evaluation on the challenge platform, the third-best overall score in the MIDOG 2022 Challenge.
Supervised Question Answering systems (QA systems) rely on domain-specific human-labeled data for training. Unsupervised QA systems generate their own question-answer training pairs, typically using secondary knowledge sources to achieve this outcome. Our approach (called PIE-QG) uses Open Information Extraction (OpenIE) to generate synthetic training questions from paraphrased passages and uses the question-answer pairs as training data for a language model for a state-of-the-art QA system based on BERT. Triples in the form of <subject, predicate, object> are extracted from each passage, and questions are formed with subjects (or objects) and predicates while objects (or subjects) are considered as answers. Experimenting on five extractive QA datasets demonstrates that our technique achieves on-par performance with existing state-of-the-art QA systems with the benefit of being trained on an order of magnitude fewer documents and without any recourse to external reference data sources.
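The triple-to-question step described above can be illustrated with a small sketch. The `Triple` type, the `triple_to_qa` function, and the "what" template are hypothetical: the abstract states only that questions are formed from the subject (or object) and the predicate, with the remaining element as the answer, and it does not give the wording PIE-QG uses.

```python
from typing import NamedTuple, Tuple


class Triple(NamedTuple):
    """An OpenIE-style <subject, predicate, object> triple."""
    subject: str
    predicate: str
    obj: str


def triple_to_qa(triple: Triple, answer: str = "object",
                 wh_word: str = "what") -> Tuple[str, str]:
    """Build a (question, answer) pair from one triple.

    answer="object":  question from subject + predicate, object is the answer.
    answer="subject": question from predicate + object, subject is the answer.
    The question template is an illustrative assumption.
    """
    if answer == "object":
        return f"{triple.subject} {triple.predicate} {wh_word}?", triple.obj
    if answer == "subject":
        question = f"{wh_word.capitalize()} {triple.predicate} {triple.obj}?"
        return question, triple.subject
    raise ValueError("answer must be 'object' or 'subject'")


example = Triple("Marie Curie", "discovered", "polonium")
pair_object = triple_to_qa(example)
pair_subject = triple_to_qa(example, answer="subject")
```

Each triple can yield two training pairs, one per answer slot, which is one way a modest number of passages could produce many synthetic questions.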
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to oversee the mission loosely and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.
While the brain connectivity network can inform the understanding and diagnosis of developmental dyslexia, its cause-effect relationships have not yet been sufficiently examined. Employing electroencephalography signals and a band-limited white noise stimulus at 4.8 Hz (the prosodic-syllabic frequency), we measure the phase Granger causalities among channels to identify differences between dyslexic learners and controls, thereby proposing a method to calculate directional connectivity. As causal relationships run in both directions, we explore three scenarios, namely channels' activity as sources, as sinks, and in total. Our proposed method can be used for both classification and exploratory analysis. In all scenarios, we find confirmation of the established right-lateralized Theta sampling network anomaly, in line with the temporal sampling framework's assumption of oscillatory differences in the Theta and Gamma bands. Further, we show that this anomaly primarily occurs in the causal relationships of channels acting as sinks, where it is significantly more pronounced than when only total activity is observed. In the sink scenario, our classifier obtains 0.84 and 0.88 accuracy and 0.87 and 0.93 AUC for the Theta and Gamma bands, respectively.